186 research outputs found

    EntrezAJAX: direct web browser access to the Entrez Programming Utilities

    Web applications for biology and medicine often need to integrate data from Entrez services provided by the National Center for Biotechnology Information. However, direct access to Entrez from a web browser is not possible due to 'same-origin' security restrictions. The use of "Asynchronous JavaScript and XML" (AJAX) to create rich, interactive web applications is now commonplace. The ability to access Entrez via AJAX would be advantageous in the creation of integrated biomedical web resources. We describe EntrezAJAX, which provides access to Entrez eUtils and is able to circumvent same-origin browser restrictions. EntrezAJAX is easily implemented by JavaScript developers and provides the same functionality as Entrez eUtils, along with enhanced functionality to ease development. We provide easy-to-understand developer examples written in JavaScript to illustrate potential uses of this service. For the purposes of speed, reliability and scalability, EntrezAJAX has been deployed on Google App Engine, a freely available cloud service. The EntrezAJAX webpage is located at http://entrezajax.appspot.com

    Defining bacterial species in the genomic era : insights from the genus Acinetobacter

    Background: Microbial taxonomy remains a conservative discipline, relying on phenotypic information derived from growth in pure culture and techniques that are time-consuming and difficult to standardize, particularly when compared to the ease of modern high-throughput genome sequencing. Here, drawing on the genus Acinetobacter as a test case, we examine whether bacterial taxonomy could abandon phenotypic approaches and DNA-DNA hybridization and, instead, rely exclusively on analyses of genome sequence data. Results: In pursuit of this goal, we generated a set of thirteen new draft genome sequences, representing ten species, combined them with other publicly available genome sequences and analyzed these 38 strains belonging to the genus. We found that analyses based on 16S rRNA gene sequences were not capable of delineating accepted species. However, a core genome phylogenetic tree proved consistent with the currently accepted taxonomy of the genus, while also identifying three misclassifications of strains in collections or databases. Among rapid distance-based methods, we found average-nucleotide identity (ANI) analyses delivered results consistent with traditional and phylogenetic classifications, whereas gene-content-based approaches appear to be too strongly influenced by the effects of horizontal gene transfer to agree with previously accepted species. Conclusion: We believe a combination of core genome phylogenetic analysis and ANI provides an appropriate method for bacterial species delineation, whereby bacterial species are defined as monophyletic groups of isolates with genomes that exhibit at least 95% pair-wise ANI. The proposed method is backwards compatible; it provides a scalable and uniform approach that works for both culturable and non-culturable species; is faster and cheaper than traditional taxonomic methods; is easily replicable and transferable among research institutions; and lastly, falls in line with Darwin’s vision of classification becoming, as far as is possible, genealogical.
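The ANI half of the species criterion in the abstract above can be illustrated with a minimal Python sketch. It only checks that every pair of genomes in a candidate group reaches the 95% threshold; it does not test monophyly, which would require the core genome tree. The strain names, the ANI values and the function names are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def min_pairwise_ani(group, ani):
    """Return the lowest ANI (%) over all unordered pairs in `group`."""
    return min(ani[frozenset(pair)] for pair in combinations(group, 2))


def satisfies_ani_criterion(group, ani, threshold=95.0):
    """True if every pair of genomes in `group` has ANI >= threshold.

    This covers only the ANI part of the proposed definition; monophyly
    must be checked separately on a phylogenetic tree.
    """
    if len(group) < 2:
        return True  # a single genome has no pairs to violate the threshold
    return min_pairwise_ani(group, ani) >= threshold


# Hypothetical pairwise ANI values (%), keyed by unordered strain pair.
ani = {
    frozenset(("strain_A", "strain_B")): 97.8,
    frozenset(("strain_A", "strain_C")): 96.1,
    frozenset(("strain_B", "strain_C")): 95.4,
    frozenset(("strain_A", "strain_D")): 88.2,
    frozenset(("strain_B", "strain_D")): 87.9,
    frozenset(("strain_C", "strain_D")): 88.5,
}

same_species = satisfies_ani_criterion(["strain_A", "strain_B", "strain_C"], ani)
with_outlier = satisfies_ani_criterion(
    ["strain_A", "strain_B", "strain_C", "strain_D"], ani
)
print(same_species, with_outlier)
```

With these made-up values, the three-strain group passes the check and adding strain_D makes it fail, since strain_D falls below 95% ANI against every other strain.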

    Clonal expansion within pneumococcal serotype 6C after use of seven-valent vaccine

    Streptococcus pneumoniae causes invasive infections, primarily at the extremes of life. A seven-valent conjugate vaccine (PCV7) is used to protect against invasive pneumococcal disease in children. Within three years of PCV7 introduction, we observed a fourfold increase in serotype 6C carriage, predominantly due to a single clone. We determined the whole-genome sequences of nineteen S. pneumoniae serotype 6C isolates, from both carriage (n = 15) and disease (n = 4) states, to investigate the emergence of serotype 6C in our population, focusing on a single multi-locus sequence type (MLST) clonal complex 395 (CC395). A phylogenetic network was constructed to identify different lineages, followed by analysis of variability in gene sets and sequences. Serotype 6C isolates from this single geographical site fell into four broad phylogenetically distinct lineages. Variation was seen in the 6C capsular locus and in sequences of genes encoding surface proteins. The largest clonal complex was characterised by the presence of a lantibiotic synthesis locus. In our population, the 6C capsular locus has been introduced into multiple lineages by independent capsular switching events. However, rapid clonal expansion has occurred within a single MLST clonal complex. Worryingly, plasticity exists within current and potential vaccine-associated loci, a consideration for future vaccine use, target selection and design.

    Calculating Orthologs in Bacteria and Archaea: A Divide and Conquer Approach

    Among proteins, orthologs are defined as those that are derived by vertical descent from a single progenitor in the last common ancestor of their host organisms. Our goal is to compute a complete set of protein orthologs derived from all currently available complete bacterial and archaeal genomes. Traditional approaches typically rely on all-against-all BLAST searching, which is prohibitively expensive in terms of hardware requirements or computational time (requiring an estimated 18 months or more on a typical server). Here, we present xBASE-Orth, a system for ongoing ortholog annotation, which applies a “divide and conquer” approach and adopts a pragmatic scheme that trades accuracy for speed. Starting at species level, xBASE-Orth carefully constructs and uses pan-genomes as proxies for the full collections of coding sequences at each level as it progressively climbs the taxonomic tree using the previously computed data. This leads to a significant decrease in the number of alignments that need to be performed, which translates into faster computation, making ortholog computation possible on a global scale. Using xBASE-Orth, we analyzed an NCBI collection of 1,288 bacterial and 94 archaeal complete genomes with more than 4 million coding sequences in 5 weeks and predicted more than 700 million ortholog pairs, clustered in 175,531 orthologous groups. We have also identified sets of highly conserved bacterial and archaeal orthologs and in so doing have highlighted anomalies in genome annotation and in the proposed composition of the minimal bacterial genome. In summary, our approach allows for scalable and efficient computation of the bacterial and archaeal ortholog annotations. In addition, due to its hierarchical nature, it is suitable for incorporating novel complete genomes and alternative genome annotations. The computed ortholog data and a continuously evolving set of applications based on it are integrated in the xBASE database, available at http://www.xbase.ac.uk/

    Genome analysis of a highly virulent serotype 1 strain of Streptococcus pneumoniae from West Africa

    Streptococcus pneumoniae is a leading cause of pneumonia, meningitis, and bacteremia, estimated to cause 2 million deaths annually. The majority of pneumococcal mortality occurs in developing countries, with serotype 1 a leading cause in these areas. To begin to better understand the larger impact that serotype 1 strains have in developing countries, we characterized virulence and genetic content of PNI0373, a serotype 1 strain from a diseased patient in The Gambia. PNI0373 and another African serotype 1 strain showed high virulence in a mouse intraperitoneal challenge model, with 20% survival at a dose of 1 cfu. The PNI0373 genome sequence was similar in structure to other pneumococci, with the exception of a 100 kb inversion. PNI0373 showed only 15 lineage-specific CDS when compared to the pan-genome of pneumococcus. However, analysis of non-core orthologs of pneumococcal genomes showed serotype 1 strains to be closely related. Three regions were found to be serotype 1 associated and likely products of horizontal gene transfer. A detailed inventory of known virulence factors showed that some functions associated with colonization were absent, consistent with the observation that carriage of this highly virulent serotype is unusual. The African serotype 1 strains thus appear to be closely related to each other and different from other pneumococci despite similar genetic content.

    A reference bacterial genome dataset generated on the MinION™ portable single-molecule nanopore sequencer

    BACKGROUND: The MinION™ is a new, portable single-molecule sequencer developed by Oxford Nanopore Technologies. It measures four inches in length and is powered from the USB 3.0 port of a laptop computer. The MinION™ measures the change in current resulting from DNA strands interacting with a charged protein nanopore. These measurements can then be used to deduce the underlying nucleotide sequence. FINDINGS: We present a read dataset from whole-genome shotgun sequencing of the model organism Escherichia coli K-12 substr. MG1655 generated on a MinION™ device during the early-access MinION™ Access Program (MAP). Sequencing runs of the MinION™ are presented, one generated using R7 chemistry (released in July 2014) and one using R7.3 (released in September 2014). CONCLUSIONS: Base-called sequence data are provided to demonstrate the nature of data produced by the MinION™ platform and to encourage the development of customised methods for alignment, consensus and variant calling, de novo assembly and scaffolding. FAST5 files containing event data within the HDF5 container format are provided to assist with the development of improved base-calling methods

    Differences in the gut microbiota between Gurkhas and soldiers of British origin

    Previous work indicated that the incidence of travellers’ diarrhoea (TD) is higher in soldiers of British origin than in soldiers of Nepalese descent (Gurkhas). We hypothesise that the composition of the gut microbiota may be a contributing factor in the risk of developing TD in soldiers of British origin. This study aimed to characterise the gut microbial composition of Gurkha and non-Gurkha soldiers of the British Army. Recruitment of 38 soldiers (n = 22 Gurkhas, n = 16 non-Gurkhas) and subsequent stool collection enabled shotgun metagenomic sequencing-based analysis of the gut microbiota. The microbiota of Gurkhas had significantly (P < 0.05) lower diversity than the gut microbiota of non-Gurkha soldiers on both the Shannon and Simpson diversity indices, using species-level markers. Non-metric Multidimensional Scaling (NMDS) of the Bray-Curtis distance matrix revealed a significant difference in the composition of the gut microbiota between Gurkhas and non-Gurkha soldiers, at both the species level (P = 0.0178) and the genus level (P = 0.0483). We found three genera and eight species that were significantly enriched in the non-Gurkha group and one genus (Haemophilus) and one species (Haemophilus parainfluenzae) which were enriched in the Gurkha group. The difference in the microbiota composition between Gurkha soldiers and soldiers of British origin may contribute to higher colonization resistance against diarrhoeal pathogens in the former group. Our findings may enable further studies into interventions that modulate the gut microbiota of soldiers to prevent TD during deployment.
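For readers unfamiliar with the measures named in the abstract above, the following Python sketch shows how Shannon diversity, a Simpson index and Bray-Curtis dissimilarity are commonly computed from species counts. It is an illustration under stated assumptions, not the study's pipeline: the abstract does not say which form of the Simpson index was used, so the Gini-Simpson form (1 minus the sum of squared proportions) is assumed here, and the two count vectors are invented.

```python
import math


def shannon(counts):
    """Shannon index H = -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def gini_simpson(counts):
    """Gini-Simpson index 1 - sum(p^2); an assumed form of 'Simpson diversity'."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two count vectors over the same taxa."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(x + y for x, y in zip(a, b))
    return numerator / denominator


# Invented species counts for two samples over the same four taxa.
even_sample = [25, 25, 25, 25]
skewed_sample = [85, 5, 5, 5]

print(shannon(even_sample), shannon(skewed_sample))
print(gini_simpson(even_sample), gini_simpson(skewed_sample))
print(bray_curtis(even_sample, skewed_sample))
```

A community dominated by one taxon scores lower on both diversity indices than an even one, and the Bray-Curtis value between samples is the kind of pairwise distance that an ordination such as NMDS takes as input.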

    Reagent and laboratory contamination can critically impact sequence-based microbiome analyses

    BACKGROUND: The study of microbial communities has been revolutionised in recent years by the widespread adoption of culture-independent analytical techniques such as 16S rRNA gene sequencing and metagenomics. One potential confounder of these sequence-based approaches is the presence of contamination in DNA extraction kits and other laboratory reagents. RESULTS: In this study we demonstrate that contaminating DNA is ubiquitous in commonly used DNA extraction kits and other laboratory reagents, varies greatly in composition between different kits and kit batches, and that this contamination critically impacts results obtained from samples containing a low microbial biomass. Contamination impacts both PCR-based 16S rRNA gene surveys and shotgun metagenomics. We provide an extensive list of potential contaminating genera, and guidelines on how to mitigate the effects of contamination. CONCLUSIONS: These results suggest that caution is needed when applying sequence-based techniques to the study of microbiota present in low biomass environments. Concurrent sequencing of negative control samples is strongly advised.